Bayesian Segmentation of Protein Secondary Structure

Authors

  • Scott C. Schmidler
  • Jun S. Liu
  • Douglas L. Brutlag
Abstract

We present a novel method for predicting the secondary structure of a protein from its amino acid sequence. Most existing methods predict each position in turn based on a local window of residues, sliding this window along the length of the sequence. In contrast, we develop a probabilistic model of protein sequence/structure relationships in terms of structural segments, and formulate secondary structure prediction as a general Bayesian inference problem. A distinctive feature of our approach is the ability to develop explicit probabilistic models for alpha-helices, beta-strands, and other classes of secondary structure, incorporating experimentally and empirically observed aspects of protein structure such as helical capping signals, side chain correlations, and segment length distributions. Our model is Markovian in the segments, permitting efficient exact calculation of the posterior probability distribution over all possible segmentations of the sequence using dynamic programming. The optimal segmentation is computed and compared to a predictor based on marginal posterior modes, and the latter is shown to provide significant improvement in predictive accuracy. The marginalization procedure provides exact secondary structure probabilities at each sequence position, which are shown to be reliable estimates of prediction uncertainty. We apply this model to a database of 452 nonhomologous structures, achieving accuracies as high as the best currently available methods. We conclude by discussing an extension of this framework to model nonlocal interactions in protein structures, providing a possible direction for future improvements in secondary structure prediction accuracy.
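The dynamic programming over segmentations described in the abstract can be illustrated with a small sketch. This is not the authors' implementation. It assumes a toy four-letter residue alphabet, independent residues within a segment (the abstract describes richer segment models with capping signals and side chain correlations), and made-up emission, length, and transition tables. All names (`segment_marginals`, `EMIT`, `LENGTH`, `TRANS`, `INIT`) are hypothetical. It computes the exact marginal posterior probability of each structure type at each position by summing over all segmentations.

```python
TYPES = ("H", "E", "L")  # helix, strand, loop
MAX_LEN = 6              # assumed cap on segment length (toy value)

def normalized(weights):
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}

# Toy parameters, invented for illustration only.
EMIT = {
    "H": {"A": 0.6, "V": 0.2, "G": 0.1, "P": 0.1},
    "E": {"A": 0.2, "V": 0.6, "G": 0.1, "P": 0.1},
    "L": {"A": 0.1, "V": 0.1, "G": 0.4, "P": 0.4},
}
LENGTH = {
    "H": normalized({n: float(n) for n in range(1, MAX_LEN + 1)}),
    "E": normalized({n: 1.0 for n in range(1, MAX_LEN + 1)}),
    "L": normalized({n: 1.0 / n for n in range(1, MAX_LEN + 1)}),
}
INIT = {t: 1.0 / 3.0 for t in TYPES}
# Adjacent segments must differ in type, so each labeling has one segmentation.
TRANS = {s: {t: (0.0 if s == t else 0.5) for t in TYPES} for s in TYPES}

def seg_lik(seq, i, j, t):
    """Likelihood of residues seq[i:j] forming one segment of type t."""
    p = LENGTH[t].get(j - i, 0.0)
    for r in seq[i:j]:
        p *= EMIT[t][r]
    return p

def segment_marginals(seq):
    n = len(seq)
    # fwd[j][t]: summed weight of all segmentations of seq[:j]
    # whose last segment has type t and ends exactly at j.
    fwd = [{t: 0.0 for t in TYPES} for _ in range(n + 1)]

    def prefix(i, t):
        # Weight of everything before a type-t segment that starts at i.
        if i == 0:
            return INIT[t]
        return sum(fwd[i][s] * TRANS[s][t] for s in TYPES)

    for j in range(1, n + 1):
        for t in TYPES:
            fwd[j][t] = sum(prefix(i, t) * seg_lik(seq, i, j, t)
                            for i in range(max(0, j - MAX_LEN), j))

    # bwd[i][s]: summed weight of all segmentations of seq[i:]
    # given that a segment of type s ended at i.
    bwd = [{t: 0.0 for t in TYPES} for _ in range(n + 1)]
    for s in TYPES:
        bwd[n][s] = 1.0
    for i in range(n - 1, 0, -1):
        for s in TYPES:
            bwd[i][s] = sum(TRANS[s][t] * seg_lik(seq, i, j, t) * bwd[j][t]
                            for t in TYPES
                            for j in range(i + 1, min(n, i + MAX_LEN) + 1))

    total = sum(fwd[n][t] for t in TYPES)  # sum over all segmentations

    # Each segment (i, j, t) contributes its posterior mass to positions i..j-1.
    marg = [{t: 0.0 for t in TYPES} for _ in range(n)]
    for j in range(1, n + 1):
        for t in TYPES:
            for i in range(max(0, j - MAX_LEN), j):
                w = prefix(i, t) * seg_lik(seq, i, j, t) * bwd[j][t] / total
                for k in range(i, j):
                    marg[k][t] += w
    return marg, total

marginals, total_weight = segment_marginals("AAAAGPVVVV")
# Predictor based on marginal posterior modes: most probable type per position.
mode_prediction = "".join(max(m, key=m.get) for m in marginals)
```

The per-position dictionaries in `marginals` each sum to one, so they can be read directly as prediction uncertainties. The optimal single segmentation mentioned in the abstract would instead replace the sums in the forward pass with maximizations and trace back the best path.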


Similar resources

Protein Secondary Structure Prediction: a Literature Review with Focus on Machine Learning Approaches

A DNA sequence, which contains all genetic traits, is not itself a functional entity. Instead, it is converted to protein sequences through transcription and translation. The protein sequence then takes on a 3D structure, which is the functional unit and can manage biological interactions using the information encoded in DNA. Every life process one can think of is undertaken by proteins with specific functio...


FTIR Investigation of Secondary Structure of Reteplase Inclusion Bodies Produced in Escherichia coli in Terms of Urea Concentration

Recent studies suggest that reducing the induction temperature would improve the quality of some recombinant inclusion bodies (IB) by providing a native-like secondary structure and leading to an improvement in protein recovery. This study focused on optimizing the solubilization condition of Reteplase, a recombinant protein with 9 disulfide bonds. The influence of lowering induction temperatur...


Protein secondary structure detection based on unsupervised word segmentation

Unsupervised word segmentation methods were applied to analyze protein sequences. Protein sequences, such as “MTMDKSELVQKA...,” were used as input to these methods. Segmented “protein word” sequences, such as “MTM DKSE LVQKA,” were then obtained. We compared the “protein words” derived via unsupervised segmentation and protein secondary structure segmentation. An interesting finding is that uns...


Prediction of Secondary Structure of Citrus Viroids Reported from Southern Iran

Viroids are the smallest single-stranded, circular, highly structured plant-pathogenic RNAs, and they do not code for any protein. Viroids belong to two families, the Avsunviroidae and the Pospiviroidae. Members of the Pospiviroidae family adopt a rod-like secondary structure. In this study the most stable secondary structures of citrus viroid variants reported from Fars province wer...


Journal title:
  • Journal of computational biology : a journal of computational molecular cell biology

Volume 7, Issue 1-2

Pages: -

Publication date: 2000